X-SOM Results for OAEI 2007

Authors

  • Carlo Curino
  • Giorgio Orsi
  • Letizia Tanca
Abstract

This paper summarizes the results of the X-SOM tool in the OAEI 2007 campaign. X-SOM is an extensible ontology mapper that combines various matching algorithms by means of a feed-forward neural network. X-SOM exploits logical reasoning and local heuristics to improve the quality of mappings while guaranteeing their consistency.

1 Presentation of the system

Nowadays, the spread of data-intensive and community-centered web applications has multiplied the number of datasources accessible through the Internet. In order to effectively query and integrate this information, a shared formalism should be used, at least as a means to mediate access to the datasources. In many situations, ontologies [8] have proved to be a suitable formalism for evenly representing the content of heterogeneous datasources [15], with a well-defined semantics. In principle, it is possible to extract an ontology from a datasource and then integrate its information content with that of other datasources by relating their respective ontologies. Ontology mapping is then defined as the process of bringing two or more ontologies into mutual agreement, by relating their similar concepts and roles by means of alignment relationships. Generally speaking, the mapping process aims at providing a unified, consistent and coherent view over multiple conceptualizations of one or more domains of interest.

In this paper, we briefly describe our ontology mapping tool, X-SOM [5] (eXtensible Smart Ontology Mapper), and summarize its performance on the OAEI 2007 test cases. The architecture of the X-SOM Ontology Mapper is composed of three subsystems: Matching, Mapping and Inconsistency Resolution. The Matching Subsystem consists of an extensible set of matching modules, each of which implements a matching technique that may be invoked by the mapper according to a configurable matching strategy; this strategy also defines the way the matching values are combined.
Each module receives as input two ontologies and returns a set of matchings between homogeneous resources (i.e., concepts with concepts, roles with roles and individuals with individuals), each with a similarity degree; the resulting structure is called a similarity map. All similarity maps produced by the Matching Subsystem are collected by the Mapping Subsystem; the various proposals are then combined by means of a feed-forward neural network, which produces an aggregated similarity degree from the single similarities computed by each module of the Matching Subsystem. Given these aggregate matching values, the Mapping Subsystem computes a set of candidate mappings by applying a pair of configurable threshold values to the set of matchings. The first threshold is called the discard threshold: matchings with a similarity degree lower than it are discarded a priori. The second threshold is called the accept threshold: matchings with a similarity degree greater than it are accepted as candidate mappings. The remaining matchings, whose similarity lies between the two thresholds, are considered uncertain and are manually evaluated by the user.

Mapping two ontologies might produce inconsistencies [12]; for this reason, the set of candidate mappings computed by the Mapping Subsystem is handed to the Inconsistency Resolution Subsystem, which is responsible for guaranteeing mapping consistency. Moreover, the X-SOM consistency-checking process can be instructed to preserve the semantics of the original ontologies, in terms of concept definitions and relationships among them. The resulting mappings capture the consensual knowledge about the domain, i.e., the information that represents an added value for the system, without changing the semantics of the input ontologies and, in turn, without incurring the need to adapt the applications built upon them. Ontologies are often published on the Web and are not accessible for modification.
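The two-threshold filtering step can be sketched as follows; the concrete threshold values and the dictionary-based similarity map are illustrative assumptions, not X-SOM's actual interface:

```python
# Sketch of the discard/accept threshold filter; the threshold values
# and the dict-based similarity map are illustrative assumptions.
DISCARD, ACCEPT = 0.3, 0.8

def filter_matchings(similarity_map):
    """Split aggregate similarities into candidate and uncertain mappings."""
    candidates, uncertain = [], []
    for pair, degree in similarity_map.items():
        if degree < DISCARD:
            continue                    # discarded a priori
        if degree > ACCEPT:
            candidates.append(pair)     # accepted as candidate mapping
        else:
            uncertain.append(pair)      # left to the user's judgment
    return candidates, uncertain

cand, unsure = filter_matchings({("Person", "Human"): 0.92,
                                 ("Car", "Author"): 0.10,
                                 ("Paper", "Article"): 0.55})
```

With these example values, ("Person", "Human") becomes a candidate mapping, ("Car", "Author") is discarded, and ("Paper", "Article") is left for the user.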
For this reason, and to preserve the original representations, X-SOM mappings are stored in a separate ontology called the mapping ontology. This ontology acts as a “bridge” between the mapped ontologies and can be used to access the global model constituted by the source ontologies connected through the mappings. If needed, the bridge ontology can also store the concept definitions needed to disambiguate some terms or to solve particular inconsistencies. X-SOM generates subsumption and equivalence mappings between pairs of resources; they are expressed by means of RDFS and OWL constructs, in order to keep the mapping definitions as interoperable as possible.

1.1 State, purpose, general statement

X-SOM has been designed to automatically discover useful relationships among ontological representations, with the purpose of enabling ontology-based data integration and tailoring [6]. The theoretical framework used in this work is that of DL ontologies; however, the X-SOM approach is very flexible and we believe it can be extended to other ontology languages, and even to other data models such as XML and the relational model. X-SOM is part of a wider research project named Context-ADDICT (Context-Aware Data Design, Integration, Customization and Tailoring) [1], which aims at defining a complete framework able to support mobile users through the dynamic hooking and integration of new, heterogeneous information sources, until a suitable, contextualized portion of the available data is delivered to their devices in a structured and effortless way. The whole process is largely based on ontological representations of both the application domain and the datasources; this naturally calls for an ontology mapping process that is as automatic as possible.

1.2 Specific techniques used

In this section we describe, in more detail, the three subsystems that constitute the X-SOM architecture.
The Matching Subsystem has been designed to be extensible, to allow easy integration of future matching modules. Since this architecture makes experimenting with new modules very easy, X-SOM can also be used as a framework for evaluating matching techniques. X-SOM's matching modules can be roughly classified into three families:

  • language-based: the modules belonging to this family compare resources by analyzing their names, labels and comments, considering both lexical and linguistic features. The lexical modules currently implemented are the Jaro module, based on the Jaro string similarity [4], and the Levenshtein module, based on the Levenshtein string distance. To exploit linguistic similarities, we implemented a WordNet module that uses the WordNet [13] thesaurus, computing distance measures such as Leacock-Chodorow [11].
  • structure-based: these modules compare the structures of the resources' neighborhoods. In X-SOM, we have implemented a modified version of the GMO (Graph Matching for Ontologies) algorithm [9], used to find structural similarity in ontological representations. Since the GMO algorithm is quite expensive in terms of computational resources, we also implemented a bounded-path matcher called Walk, which achieves lower performance while requiring fewer resources.
  • semantics-based: the modules belonging to this family implement algorithms that use background, contextual and prior knowledge to compute the similarity degree between two resources. At the moment, only a Google-based algorithm, described in [3], is implemented.

The Mapping Subsystem receives as input the set of similarity maps computed by the modules of the Matching Subsystem, and produces a set of candidate mappings to be verified by the Inconsistency Resolution Subsystem. The most challenging issue is how to aggregate all the contributions coming from the various matching modules.
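As an illustration of the lexical modules, the Levenshtein distance mentioned above can be normalized into a [0, 1] similarity degree; this is a generic sketch, not X-SOM's actual implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def name_similarity(a: str, b: str) -> float:
    """Normalize the edit distance into a [0, 1] similarity degree."""
    m = max(len(a), len(b))
    return 1.0 if m == 0 else 1.0 - levenshtein(a, b) / m
```

A similarity degree in [0, 1] makes the lexical scores directly comparable with the outputs of the other module families when they are aggregated.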
In our setting, the problem has been modeled as the estimation of an optimal aggregation function y = W(X), where each component x_i ∈ X is the matching degree given by the i-th module of the schedule with respect to a pair of resources, and y is the computed aggregate similarity. The Mapping Subsystem is as extensible as the Matching Subsystem described above; new aggregation algorithms can be added to X-SOM by implementing a simple interface. At the current development state of the prototype, the most effective aggregation algorithm implemented uses a three-layer, feed-forward neural network. The learning algorithm is standard back-propagation with cross-validation; the values for the momentum and the learning rate have been set after empirical evaluation (i.e., over 50,000 runs of the tool). Notice that the task of determining a good aggregation function is, in general, very hard, since no single aggregation function is suitable for every possible alignment situation. Even in the trivial situation where W is approximated with a linear function (e.g., a weighted mean), determining the weight of each module requires the user to know in advance how reliable the various techniques are.

Another interesting aspect is how to build a suitable training set for the neural network. In X-SOM, the training set is generated from a manually aligned pair of ontologies called the reference alignment; correct mappings generate a sample with desired output equal to 1.0, while the others are set to zero. Moreover, a cleaning process removes duplicate samples (i.e., similar inputs and the same desired output), conflicting samples (i.e., the same inputs but contradictory desired outputs) and linearly dependent samples. For conflicting samples, only the ones with desired output equal to 1.0 are kept; the reason lies in the way the desired outputs are obtained.
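The cleaning of duplicate and conflicting samples can be sketched as follows; the sample representation is an assumption, and the removal of linearly dependent samples is omitted for brevity:

```python
def clean_training_set(samples):
    """samples: list of (inputs, desired) pairs, where inputs is the tuple
    of module similarities and desired is 0.0 or 1.0.
    Duplicates collapse into one sample; for conflicting samples the
    positive one (desired == 1.0) wins, mirroring the trainer's rule."""
    best = {}
    for inputs, desired in samples:
        best[inputs] = max(best.get(inputs, 0.0), desired)
    return sorted(best.items())

cleaned = clean_training_set([((0.5, 0.9), 1.0),   # correct alignment
                              ((0.5, 0.9), 0.0),   # conflicting sample
                              ((0.1, 0.2), 0.0)])  # negative sample
```

Keeping the positive sample in a conflict follows from how the desired outputs are derived from the reference alignment, as explained next.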
To determine whether a set of inputs should correspond to a positive outcome, the trainer looks at the reference alignment: if the given set of inputs is generated by a correct alignment, the outcome is positive (i.e., 1.0); otherwise, it is set to zero. When two conflicting samples are found, the trainer assumes that the one with the positive outcome is correct, while the other is discarded. In certain situations a module may be unable to produce a similarity degree for a given pair of resources; in this case, the value is approximated by averaging the similarity degrees generated by the other modules belonging to the same family.

Once the neural network has produced the aggregate similarity values, X-SOM filters them by means of two configurable thresholds: accept and discard. These thresholds also determine the level of automation of the tool, called its behavior, which can be fully-automatic, conservative or human-intensive. When X-SOM acts with one of the last two behaviors (i.e., the supervised behaviors), the user can be involved in deciding which matchings should be accepted. In particular, with the conservative behavior, all the mapping proposals with a similarity degree between the discard and accept thresholds are submitted to the user for evaluation. When the user does not agree with an X-SOM proposal about a pair of resources, the network trainer performs additional training steps until the output of the network agrees with the user, thus allowing fine-tuning of the network's biases. The human-intensive behavior is very similar to the previous one, except that it does not discard any mapping a priori, leaving the user free to explore all the mappings with a similarity degree under the accept threshold.

The Inconsistency Resolution Subsystem takes as input the candidate mappings from the Mapping Subsystem and produces a set of mappings in which at least all the logical inconsistencies have been resolved.
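The family-average fallback for a module that produces no similarity degree, described earlier in this section, can be sketched as follows; the module names and family grouping are illustrative:

```python
# Illustrative module-to-family grouping (assumed, not X-SOM's config).
FAMILIES = {"jaro": "language", "levenshtein": "language",
            "wordnet": "language", "gmo": "structure", "walk": "structure"}

def impute_missing(scores):
    """Replace a module's missing score (None) with the mean of the
    scores produced by the other modules of the same family."""
    out = dict(scores)
    for module, value in scores.items():
        if value is None:
            family = FAMILIES[module]
            peers = [v for m, v in scores.items()
                     if m != module and FAMILIES[m] == family and v is not None]
            out[module] = sum(peers) / len(peers) if peers else 0.0
    return out

# The WordNet module failed on this pair; its language-family peers fill in.
imputed = impute_missing({"jaro": 0.8, "levenshtein": 0.6, "wordnet": None})
```

Falling back to 0.0 when a whole family is silent is an assumption of this sketch; the paper only specifies the within-family average.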
Since the input ontologies are assumed to be consistent, consistency resolution reduces to identifying those mappings that introduce a contradiction into the final model. This problem is faced in X-SOM at two different levels: the consistency check and what we call the semantic coherence check.

The consistency check locates those mappings that introduce a logical contradiction into the original ontologies. X-SOM uses an extended tableau algorithm to identify the set of mappings responsible for an inconsistency, and a set of heuristic rules, based on the similarity degree, to remove those mappings; since the removal of mappings leads to a loss of information, the rules try to preserve as much information as possible, in terms of logical axioms. The inconsistency resolution policies are also affected by the tool behavior described above. When the tool acts in a supervised behavior, the inconsistent mappings are submitted to the user, who selects the correct ones; wrong mappings are then removed automatically. When acting with the fully-automatic behavior, X-SOM removes the less probable mappings using the heuristic rules.

By semantic coherence check we mean the process of verifying whether there are mappings that introduce a semantic incoherence into the model without introducing a logical contradiction into the T-BOX. To better explain what we mean by semantic coherence, let us introduce the notion of local entailment: an entailment A ⊑ B in the global model is said to be local to an ontology O if it involves only resources of O. By semantic incoherence we mean the situation in which the alignment relationships enable one or more local entailments that were not enabled within the original ontologies. In general, this is a desirable behavior for systems that exploit ontologies; however, in certain situations, it is possible to introduce an incoherent assertion without introducing a logical contradiction into the model.
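The local-entailment test can be sketched by treating subsumption axioms and subsumption mappings as edges of a directed graph and entailment as transitive closure; a real implementation would rely on a DL reasoner, and all names here are illustrative:

```python
def closure(edges):
    """Naive transitive closure of a set of (sub, super) subsumption edges."""
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

def incoherent_mappings_present(ontology_edges, mapping_edges, resources):
    """True if the mappings enable a local entailment (an A ⊑ B between
    resources of the same ontology) that the ontology alone did not."""
    before = closure(ontology_edges)
    after = {(a, b) for a, b in closure(ontology_edges | mapping_edges)
             if a in resources and b in resources}
    return not after <= before

# One ontology whose resources include Student, Person and Teacher; the
# mappings route Person under Teacher through an external concept Ext.
o1_edges = {("Student", "Person")}
o1_resources = {"Student", "Person", "Teacher"}
bridge = {("Person", "Ext"), ("Ext", "Teacher")}
has_new = incoherent_mappings_present(o1_edges, bridge, o1_resources)
no_new = incoherent_mappings_present(o1_edges, set(), o1_resources)
```

This test also flags subsumption cycles introduced by the alignment, since a cycle entails new local subsumptions among the concepts it involves.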
A simple example of semantic incoherence is the emergence of a cycle of subsumptions after a mapping process, which leads to the collapse of the involved concepts into a single concept. The collapse of two concepts that were only in a subclass relationship in the original ontologies changes the semantics of the representation: for this reason, our algorithm removes the mappings responsible for this behavior. We consider a semantic incoherence a possible symptom of an inconsistency; since we are interested in developing a high-precision ontology mapper, we currently adopt a conservative approach that does not allow any change in the semantics of the original ontologies. The main drawback of this approach is that some useful inferences on the global model may be lost.

1.3 Adaptations made for the evaluation

In order to comply with the test cases proposed in this contest, we made two main adaptations:

  • External resources: with the original configuration of X-SOM, external resources (e.g., FOAF definitions) are imported and used in the mapping process. As a result, mappings between pairs of external resources are also included in the alignment ontology produced by X-SOM. To avoid a wrong computation of the performance measures, we artificially removed this kind of mapping from the output of the tool when it was not part of the reference alignment.
  • Properties comparison: in some reference alignments, datatype properties are compared and aligned with object properties; since this kind of mapping is normally forbidden in X-SOM, we modified the matching algorithms to allow it.

1.4 Link to the system and parameters file

X-SOM is an open-source project, since it also relies on existing implementations of known matching algorithms. To obtain a working copy of the X-SOM prototype, along with the source code, please send an email to [email protected].
1.5 Link to the set of provided alignments

http://home.dei.polimi.it/orsi/xsom-oaei07.zip


Publication date: 2007